Pitch shifting is a common voice editing technique in which the original pitch of a digital voice is raised or lowered. A malicious attacker can abuse it to conceal their true identity. Existing forensic detection methods are no longer effective on weakly pitch-shifted voice. In this paper, we propose a convolutional neural network (CNN) that detects not only strongly pitch-shifted voice but also weakly pitch-shifted voice, whose shifting factor is within ±4 semitones. Specifically, linear frequency cepstral coefficients (LFCC) are computed from power spectra, and their dynamic coefficients are extracted as the discriminative features. The CNN model is carefully designed, with particular attention to the input feature map, the activation function, and the network topology. We evaluated the algorithm on voices from two datasets processed with three pitch-shifting software tools. Extensive experimental results show that the algorithm achieves high detection rates for both binary and multi-class classification.
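To make the feature pipeline concrete, the following is a minimal sketch of how LFCC and their dynamic (delta) coefficients might be computed from power spectra. It is not the implementation used in this paper: the function names `lfcc` and `delta`, the frame length, hop size, FFT size, number of filters, number of coefficients, and the three-channel stacking at the end are all illustrative assumptions.

```python
import numpy as np

def lfcc(signal, frame_len=400, hop=160, n_fft=512, n_filters=20, n_ceps=13):
    """Return an (n_frames, n_ceps) array of LFCCs. All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame: (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular filters whose centers are spaced linearly in frequency
    # (this is what distinguishes LFCC from mel-scaled MFCC).
    n_bins = n_fft // 2 + 1
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)[None, :]
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bins - left) / (center - left)
    falling = (right - bins) / (right - center)
    fbank = np.clip(np.minimum(rising, falling), 0.0, None)
    # Log filterbank energies; the small constant avoids log(0).
    log_energy = np.log(power @ fbank.T + 1e-10)
    # Unnormalized DCT-II over the filter index gives the cepstral coefficients.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))
    return log_energy @ dct.T

def delta(features, width=2):
    """Regression-based dynamic coefficients along the time (first) axis."""
    n = len(features)
    # Repeat the edge frames so every frame has `width` neighbors on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    num = sum(
        t * (padded[width + t: width + t + n] - padded[width - t: width - t + n])
        for t in range(1, width + 1)
    )
    den = 2 * sum(t * t for t in range(1, width + 1))
    return num / den

# Usage on a synthetic one-second tone sampled at 16 kHz (an assumed rate).
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
static = lfcc(tone)
d1 = delta(static)
d2 = delta(d1)
# One possible CNN input: static, delta, and delta-delta maps as three channels.
feature_map = np.stack([static, d1, d2])
```

Stacking the three maps as channels is only one plausible way to build the input feature map; the abstract states that the dynamic coefficients serve as the discriminative features but does not specify the layout.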